Is the p-value properly interpreted by critical care professionals? Online survey

Objective To determine the prevalence of and risk factors for insufficient knowledge related to p-values among critical care physicians and respiratory therapists in Argentina. Methods This cross-sectional online survey contained 25 questions about respondents’ characteristics, self-perception and p-value knowledge (theory and practice). Descriptive and multivariable logistic regression analyses were conducted. Results Three hundred seventy-six respondents were analyzed. Two hundred thirty-seven respondents (63.1%) did not know about p-values. According to the multivariable logistic regression analysis, a lack of training on scientific research methodology (adjusted OR 2.50; 95%CI 1.37 - 4.53; p = 0.003) and the amount of reading (< 6 scientific articles per year; adjusted OR 3.27; 95%CI 1.67 - 6.40; p = 0.001) were found to be independently associated with the respondents’ lack of p-value knowledge. Conclusion The prevalence of insufficient knowledge regarding p-values among critical care physicians and respiratory therapists in Argentina was 63%. A lack of training on scientific research methodology and the amount of reading (< 6 scientific articles per year) were found to be independently associated with the respondents’ lack of p-value knowledge.


INTRODUCTION
Healthcare professionals must rely on updated clinical information to practice evidence-based medicine (EBM). (1) To address their clinical questions, healthcare professionals need to critically appraise the design and procedure of the studies and interpret the results. (2) Null hypothesis (H0) significance testing based on p-values-indicators used to reject or not reject null hypothesesis the primary technique for drawing conclusions from data in many health disciplines. (3) Several survey studies have demonstrated that a large number of healthcare professionals are unable to understand and interpret statistical results appropriately. (4)(5)(6)(7) Horton et al. reported that many health professionals have increased difficulty because increasingly complicated statistical methods are being reported in the medical literature, and thus, these professionals may be able to understand the analysis and interpretation of results in only 21% of research articles. (8) Informally, a p-value is the probability under a specified statistical model that a statistical summary of the data (e.g., the sample mean difference between two groups being compared) would be equal to or more extreme than its observed value. (9) The most common misconceptions about the p-value are the inverse probability fallacy, replication fallacy, clinical or practical significance fallacy, and effect size fallacy. (10)(11)(12)(13) The "inverse probability fallacy" is the false belief that the p-value indicates the probability that H0 is true, given certain results [P (H0/results)]. Essentially, it means confusing the probability of the result, assuming that the null hypothesis is true [P (results/H0)], with the probability of the null hypothesis, given certain data [P (H0/results)]. (14) The second misconception is called the "replication fallacy", which is the belief that the p-value is the degree of replicability of the result, and its complement, 1-p, is frequently misinterpreted as the probability a result will be replicated. (10,13) That is, the belief that result with a p-value of 0.05 means that 95 times out of 100, the statistically significant results obtained in a study will be the same in future research. (15) However, p-values provide only very little information about what is likely to happen upon replication, and they may differ upon replication simply because of sampling variability. (3) The false belief that the p-value provides direct information about the effect size is called the "effect size" fallacy. (16) Researchers believe that the smaller the p-value is, the larger the effect size is. (12,17) The last misconception is called the "clinical or practical significance" fallacy, which relates statistical significance to the importance of the effect size. (10) A statistically significant result, however, may lack clinical significance, and vice versa; therefore, the clinical or practical significance of the findings should be described by an expert in the field and not presented by statistics alone. (11) Despite the important role played by statistical interpretation and critical appraisal of published studies in the practice of EBM, there is not enough evidence regarding critical care professionals' knowledge of the topic.
The objective of this study was to determine the prevalence of and risk factors for insufficient p-value knowledge among critical care physicians and respiratory therapists in Argentina.

METHODS
This is an observational cross-sectional survey study conducted between August 30 and November 30, 2018. Informed Consent was not required since participation was voluntary and anonymous. The protocol study was approved by the Hospital Nacional Profesor Alejandro Posadas Ethics Committee (312 EmnPeS0/19).
We included healthcare professionals in the field of cardiorespiratory care in our analysis. Professionals not working in Argentina and those who quit the survey before section B (filter question) were excluded from our analyses.

Pilot testing
Before the study, a pilot test was conducted to assess the viability and feasibility of the survey. The survey was administered to 42 healthcare professionals, and the time required to answer the questions was recorded. We also asked each professional to report whether the survey, or a specific question, presented any difficulties. Forty (95.2%) respondents stated that the survey was clear and that they understood its objective. Thirtyseven (88.1%) understood all the questions. Three participants had difficulties with question 19, and two participants had difficulties with question 5. With respect to the degree of difficulty, seven respondents (16.7%) considered that the survey was very easy; five (11.9%) considered the survey easy; 13 (31%) considered the survey moderate; 15 (35.7%) considered the survey difficult; and two (2.4%) considered the survey very difficult. The median time to respond to the survey was 6.5 (5 -8) minutes.

Data collection
Through convenience and no probabilistic sampling, professionals were invited to participate via email and social networks. The invitation included the objective of the study and a link to access the survey online through the SurveyMonkey® tool (https://es.surveymonkey.com/r/ valorp).

Instrument
The survey contained 25 questions divided into three sections (Appendix 1).
The first section (A) consisted of 13 nominal and ordinal questions about the respondents' professional characteristics, such as background, academic education, and experience in scientific reading.
The second section (B) consisted of one nominal dichotomous (Yes/No) question pertaining to the respondents' self-perception about their p-value knowledge. If the answer was negative, the survey ended.
Finally, the third section (C) consisted of 11 nominal (T/F) questions (True/False/Do not know) about p-values: 6 theory questions, 4 practice interpretation questions, and 1 definition question (Appendix 1).
Questions in section C were administered in a random order.

Primary outcome measure
The lack of p-value knowledge was the main outcome of the study. Respondents who stated they did not know about p-values (a "no" answer to the section B question) or those who did not reach the required score in any of the two categories (theory or practice) were considered "unknowledgeable about p-values". Those who quit the survey in section C without reaching the required threshold in at least one of the two categories (theory or practice) were also considered "unknowledgeable about p-values".

Statistical analysis
Categorical variables are presented as numbers and percentages. Continuous variables with a normal distribution are presented as the mean and standard deviation. Nonnormally distributed variables are presented as medians and interquartile ranges. The distribution of continuous variables was assessed using the Kolmogorov-Smirnov test.
The test for a difference in proportions was performed to compare nominal variables between categories.
The main outcome was lack of p-value knowledge (theory and/or practice). P-value knowledge questions were grouped into theory questions (15, 16, 19, 20, 21, and 22) and practice questions (17, 18, 23, and 24). The respondent was considered to have sufficient theoretical knowledge if at least 4 out of 6 theory questions (67%) were correctly answered. The respondent was considered to have sufficient practical knowledge if at least 3 out of 4 practice questions (75%) were correctly answered. The respondent was considered to know about p-values if the required score was reached in either of the two categories.
The associations between p-value knowledge and other variables were determined via univariate analysis. The odds ratios (OR) and their corresponding 95% confidence intervals (95%CI) were reported. Variables with a p-value < 0.15 were included in the multivariable logistic regression model to identify those that were independently associated with p-value knowledge. A backward conditional stepwise (Wald) method was used. A p-value < 0.05 was considered significant. Statistical analysis was performed using IBM Statistical Package for Social Sciences (SPSS), v. 22.0, software for Macintosh (IBM Corp., Armonk, NY, United States).

RESULTS
A total of 896 surveys were collected; 520 were excluded because the eligibility criteria were not met.
A total of 376 surveys were analyzed: 210 (55.9%) participants were physicians, and 166 (44.1%) were respiratory therapists. The characteristics of the sample are detailed in table 1. Only 139 (37.0%) respondents answered the p-value questions satisfactorily (at least 4 correct theory responses and/or 3 correct practice responses). Results expressed as median (interquartile range) or n (%).
Two hundred thirty-seven respondents did not understand p-values [63.1% (95%CI 58.0% -67.7%)]. Of these respondents, 47 (12.5%) reported that they did not understand p-values, and 190 (50.5%) reported that they did understand p-values even though they did not reach the cutoff scores for either of the knowledge categories (theory and practice). The results of sections B and C (questions 14 through 24) are summarized in table 2. Respondents' self-assessment regarding "critical appraisal of a scientific article" and its association with the overall survey result (understanding or not understanding p-values) are detailed in figure 1 (p < 0.001). Differences were only observed between the respondents who understood p-values (the highest scores) and all other participants as well as between those who did not understand p-values (the lowest scores of the scale) and all other participants (p = 0.019 and p = 0.005, respectively).
In question 25, respondents had to choose the correct p-value definition (item c, "both options are correct"). Only 104 of 376 respondents (27.6%) answered this item correctly.
The univariate and multivariable binary logistic regression models are detailed in table 3. According to the multivariable logistic regression analysis, a lack of training on scientific research methodology (adjusted OR 2.50 [95%CI 1.37 -4.53], p = 0.003) and the amount of reading (< 6 scientific articles per year) (adjusted OR 3.27 [95%CI 1.67 -6.40], p = 0.001) were found to be independently associated with the respondents' lack of p-value knowledge.

DISCUSSION
Our main finding was a high prevalence of insufficient p-value knowledge among critical care physicians and respiratory therapists. These findings are in line with the results of prior studies. Such results revealed that a high percentage of healthcare professionals experienced difficulties in understanding and interpreting p-values. (18)(19)(20)(21) According to a survey conducted by Badenes-Ribera et al. among Spanish psychology professors, many university professors did not know how to correctly interpret p-values. (22) The authors conducted a similar survey among Italian and Chilean psychology university students and observed that a percentage of the respondents were not able to interpret p-values. (14) Msaouel et al. performed a multi-institutional survey of Greek medical residents about basic statistical concepts. (20) The results showed that a large number of medical residents were unable to correctly interpret the concepts that are commonly found in the medical literature. Susarla and Redett also assessed the knowledge, attitudes and confidence with biostatistics in a similar population. (23) The authors concluded that residents place a high degree of importance on biostatistics knowledge, but they have only a fair understanding of core statistical concepts.
In accordance with our study, two factors were found to be independently associated with the respondents' lack of p-value knowledge: a lack of training on scientific research methodology and the amount of reading (< 6 scientific articles per year). These results are consistent with the literature. (24) In our study, we also noticed that being trained in research methodology does not prevent professionals from misinterpreting p-values. The assumption that training prevents incorrect interpretations is a false belief that could be spread among less experienced or trainee colleagues. (25) A study that assessed medical residents' attitudes and confidence with epidemiology and biostatistics concluded that being trained in biostatistics and reading a higher number of journals in statistics and epidemiology on a monthly basis were associated with a positive attitude towards biostatistics and increased confidence with statistical concepts. (23) Similarly, our results indicates that professionals who read more than 6 scientific articles per year had higher levels of p-value knowledge.
The lack of p-value knowledge was more prevalent with respect to theoretical knowledge than practical knowledge. This may be because when healthcare professionals read scientific articles, they do not usually apply a sine qua non probabilistic interpretation of p-values. Such results only require the reader to routinely apply the p < alpha rule. Therefore, statistical interpretation is only based on the valuation of the p-value compared to the alpha value. (26) This presumption seems to be based on the results obtained for the question about p-values. Although there was no statistically significant difference between the professionals who understood p-values and those who did not, a high number of respondents could not provide a correct definition. (9,27) Respondents' self-assessment regarding critical appraisal should be highlighted. Respondents who reported having remarkable critical appraisal skills (five points) responded to the survey correctly. Likewise, respondents who reported having poor critical appraisal skills (one point) also showed low levels of p-value knowledge. However, it is noteworthy that a large percentage of respondents who reported high critical appraisal skills (three or four points) failed to reach the cutoff scores for p-value knowledgeable. This finding could be due to the existing contradiction between poor training in statistics and the oversized importance placed on the p-value in medical publications.
The most common misconceptions of the p-value are the "fallacies" that may seriously jeopardize the correct interpretation of results. (10)(11)(12)(13) In agreement with our results, Msaouel et al. also observed that medical residents are especially prone to the gambler fallacy bias. This is caused by the erroneous belief according to which an event is more likely to occur if it has not previously occurred and vice versa. This bias may undermine clinical judgment and medical decision making. (20) P-values may be misinterpreted due to multiple factors, such as the results and publication biases observed in the literature. Results bias is the phenomenon of authors reporting only satisfactory results. On the other hand, publication bias is the phenomenon of scientific journals accepting only articles with statistically significant results and rejecting articles with nonsignificant results. (28)(29)(30)(31) More than 12% of the respondents reported that they did not understand p-values. This probably indicates that some professionals do not read scientific articles. It is therefore necessary to improve training in this field to ensure highquality knowledge. (25) Proper systematic training in biostatistics is required to debias professionals and ensure that they are proficient in understanding and communicating statistical information. (20) This study has some limitations. First, those respondents who quit in section C were considered to "lack p-value knowledge". Therefore, we might have overestimated the prevalence of insufficient knowledge, since these respondents may have finished the survey and reached the cutoff scores for p-value knowledge. Similarly, we have considered participants who answered negatively to question 14 to "lack p-value knowledge" without allowing them to continue with the questions in section C. The reason for excluding these participants was justified because it could have resulted in a greater number of dropouts and incomplete answers due to the survey length and the possibility of participants providing random answers just to complete the survey. This could be another factor that jeopardizes the validity of the "lack of p-value knowledge" estimate.
Second, we arbitrarily grouped questions into theory and practice knowledge and arbitrarily determined the cutoff scores to define a lack of knowledge. However, even if the questions were posed by the authors, the content assessed in each of them was based on prior studies. (14,22) Moreover, to avoid random responses, we added the option "I do not know". Another limitation of this study is the fact that we did not use a validated instrument, but to minimize this limitation, we conducted a pilot test in which virtually 90% of the respondents answered that they understood all the questions. This is the first study to report the level of p-value knowledge among critical care physicians and respiratory therapists in Argentina. According to the results, we consider that training in critical appraisal should be included in the curricula of first-degree programs, with specialization in scientific literature reading and interpretation. Furthermore, healthcare professors should encourage their students to attend and participate in scientific activities.

CONCLUSION
The overall prevalence of insufficient p-value knowledge among critical care physicians and respiratory therapists in Argentina was 63%. Two factors were found to be independently associated with the respondents' lack of p-value knowledge: a lack of training on scientific research methodology and the amount of reading (< 6 scientific articles per year).

C. P-value questions
15. The p-value is a probability a) True. b) False. c) I do not know.
Imagine we are conducting a study in which two treatment groups are being compared, and we define a p-value < 0.05 (type I error or α < 5%) as "statistical significance". Based on this premise, mark whether the following statements about p-value interpretation are true or false.  24. The p-value observed in our study was significant (p = 0.02). This confirms that the effect of the treatment was higher than that observed in a similar study with a p-value = 0.04. a) True. b) False. c) I do not know. 25. Which of the following statements is the definition of the p-value? a) The p-value is the probability of obtaining a result that is equal to, or more extreme than, the result observed.
b) The p-value indicates to what degree data are not consistent with the null hypothesis. c) Both options are correct. d) I do not know. 10. How many scientific articles have you aproximately read in the last year? a) I have not read any scientific articles in the last year. b) 1 to 5. c) 6 to 12. d) More than 12.
11. In your opinion, do you think that language is a barrier to reading scientific articles? a) Yes. b) No.